Multiplicative Adjustment of Class Probability: Educating Naïve Bayes
Authors

Abstract
Starting from the Naïve Bayes model, we develop a new concept for aggregating items of evidence in classification problems. We show that in Naïve Bayes, each feature variable contributes a multiplicative adjustment factor to the estimated class probability. We next introduce a way of controlling the importance of the feature variables by raising each adjustment factor to a different power. The powers are chosen so as to maximize the accuracy of estimated class probabilities on the training data, and their optimal values are obtained by fitting a logistic regression model whose explanatory variables are constructed from the feature variables of the classification problem. This optimization accomplishes more than what feature selection does for Naïve Bayes. We call this new model family the Adjusted Probability Model (APM). We also define a regularized version, APMR. Experiments demonstrate that APMR is surprisingly effective. Assigning different degrees of importance to the feature variables seems to remove much of the naïveté from Naïve Bayes.
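The abstract's construction can be sketched concretely for the binary-class, binary-feature case: each feature's multiplicative adjustment factor becomes, in log space, an explanatory variable for a logistic regression, whose fitted coefficients are the per-feature exponents. The sketch below is illustrative only (toy data, names, and the smoothing choice are assumptions, not details from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data: 200 samples, 5 binary features (illustrative only)
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = (rng.random((200, 5)) < np.where(y[:, None] == 1, 0.7, 0.3)).astype(int)

def cond_prob(X, y, c):
    # Estimate P(x_i = 1 | class = c) with add-one (Laplace) smoothing
    Xc = X[y == c]
    return (Xc.sum(axis=0) + 1) / (len(Xc) + 2)

p1, p0 = cond_prob(X, y, 1), cond_prob(X, y, 0)

# Explanatory variables z_i = log of feature i's adjustment factor,
# i.e. log P(x_i | c=1) - log P(x_i | c=0) for the observed value x_i
Z = X * np.log(p1 / p0) + (1 - X) * np.log((1 - p1) / (1 - p0))

# Logistic regression fits one exponent w_i per feature. Setting all
# w_i = 1 recovers plain Naive Bayes; penalizing the coefficients
# (finite C here) corresponds in spirit to the regularized APMR.
apm = LogisticRegression(C=10.0).fit(Z, y)
print("fitted exponents:", apm.coef_.round(2))
print("training accuracy:", apm.score(Z, y))
```

Under this reading, the power weighting is exactly a reweighting of each feature's log likelihood ratio, which is why ordinary logistic-regression machinery suffices to optimize the exponents.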
Similar Resources
Interval Estimation Naïve Bayes
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with assumptions of conditional independence among features given the class, called naïve Bayes, is competitive with state-of-the-art classifiers. In this paper a new naive Bayes classifier called Interval Estimation naïve Bayes is proposed. Interval Estimation naïve Bayes proceeds in two phases. In the ...
Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation
In many pattern recognition/classification problems the true class-conditional model and class probabilities are approximated, to reduce complexity and/or to ease statistical estimation. The approximated classifier is expected to perform worse, here measured by the probability of correct classification. We present an analysis valid in general, and easily computable formulas for ...
Learning Semi Naïve Bayes Structures by Estimation of Distribution Algorithms
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier called naïve Bayes is competitive with state-of-the-art classifiers. This simple approach stems from assumptions of conditional independence among features given the class. Improvements in the accuracy of naïve Bayes have been demonstrated by a number of approaches, collectively named semi naïve Bayes classi...
Bayesian Models to Assess Risk of Corruption of Federal Management Units
This paper presents a data mining project that generated Bayesian models to assess the risk of corruption of federal management units. With thousands of extracted features related to corruptibility, the data were processed using techniques such as correlation analysis and per-class variance. We also compared two different discretization methods: Minimum Description Length Principle (MDLP) and Class-At...
Journal:
Volume Issue
Pages -
Publication date: 2002